Add plaintext client as alternative to SSH. #198
base: master
Conversation
This offers an alternative to the SSH client currently used during scale testing. This instead connects to a server using plaintext and sends commands that way. It is assumed that a server is listening on the host on port 8000. Because ncat is a good example of such a server, the client is called NCatClient in this change.

The reason for this new method is to eliminate the overhead of connecting over SSH from the measurements in tests. For reasonably large-scale tests (~300 nodes) it results in about a 7% reduction in execution time.

One notable difference between this new method and using SSH is that connections are created per host, rather than per sandbox. This allows clients that connect to sandboxes hosted on the same machine to share the same connection. This has only been tested with tests that execute serially. Tests that run scenarios in parallel may encounter issues.

If there is no server listening and a connection cannot be created, we fall back to using SSH.
The commit message mentioned: "One notable difference between this new method and using SSH is that connections are created per host, rather than per sandbox. ..." To my understanding, the SSH connections are per host (farm node) too, not per sandbox. cc @l8huang
Otherwise LGTM. (Please rebase)
try:
    global NCAT_CLIENT_CACHE
    server = cred["host"]
    return NCAT_CLIENT_CACHE.setdefault(server, NCatClient(server))
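One side note on the caching pattern in this snippet: `dict.setdefault` evaluates its second argument eagerly, so `NCatClient(server)` is constructed (and presumably a connection attempted) even when the cache already holds a client for that host. A lookup-first sketch avoids that; `get_client` and `factory` are illustrative names, not part of the PR:

```python
NCAT_CLIENT_CACHE = {}

def get_client(server, factory):
    # Only construct (and connect) on a cache miss, unlike
    # setdefault(server, factory(server)), which always calls factory.
    client = NCAT_CLIENT_CACHE.get(server)
    if client is None:
        client = NCAT_CLIENT_CACHE[server] = factory(server)
    return client
```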
Could you please add an option for selecting the client? It's better to keep the SSH client as the default instead of a fallback. There could be security concerns about using a plaintext client on a corporate network.
Absolutely, I can do that.
@hzhou8 correct, SSH connections are per farm node. @putnopvut Thanks for this performance improvement. Could you please add some documentation about the server-side settings for using this?
Sure, in the next revision I'll add documentation about the server side.

My understanding was that there could be multiple farm nodes per physical server. For instance, you might have 10 farm nodes implemented as containers on a single physical server. In other words, the credential['host'] and credential['password'] of all 10 farm nodes would be the same values. Each of these farm nodes would then use a separate SSH connection, even though they're all connecting to the same physical server.

With the plaintext client, I've set it up so that those 10 farm nodes share the same connection instead. This reduces the number of connections required during the test. However, this has a drawback: if you are running parallel scenarios, it will cause test failures, since scenarios that run in parallel can receive messages intended for other running scenarios.
raise NCatError(details)

def close(self):
    # Test scenarios call close after every operation because with SSH,
To my understanding this is not true for SSH. We don't close connections after every operation. The SSH connection is cached for every farm node, which models a physical server. Each new operation doesn't recreate the SSH connection; it reuses the existing connection to the node. Connections are closed when the test is done and the test program exits.
OK, I have an explanation for this :)
This patch was originally made against https://github.com/dceara/ovn-scale-test/tree/ovn-switch-per-node-devel . When I discussed pushing this change with Dumitru, we agreed to open a PR here first and then port it to that particular branch afterward.

On that branch, we added close() calls in a lot of places. We ran into issues where tests would fail after a while: we could no longer open new SSH connections because the servers would reject incoming connections due to having too many open. So we added close() calls after basically every SSH command. My comment here makes sense in the context of how that branch operates, but here it's just wrong. I'll correct it.

By the way, if you're wondering why we ran into this problem of too many SSH connections, it likely has to do with our different understanding of farm nodes. As I said in another comment, we configure our tests to have multiple farm nodes per physical machine. So the number of SSH connections can pile up more quickly than if each farm node mapped to a single physical machine.
@putnopvut sorry for the confusion with the branches in my fork. Branch ovn-switch-per-node-devel is stale. What we use downstream is ovn-switch-per-node, and in that branch I removed all the close() calls because we addressed the issues we had, i.e., we use runner type serial instead of a single runner that would connect to all farm nodes. This way only the connections used in one iteration of the runner are open (and they get cleaned up when the iteration ends).

I'll try to open PRs for all the stuff that's only in my fork as soon as I get the chance. I hope that will clear things up.
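The serial-runner arrangement described above corresponds to a Rally task file along these lines. The scenario name is a placeholder invented for illustration; the relevant part is the `runner` section, which makes iterations run one at a time so each iteration's connections can be cleaned up when it ends:

```json
{
  "HypotheticalScenario.example": [
    {
      "runner": {
        "type": "serial",
        "times": 10
      }
    }
  ]
}
```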
Thanks for the explanation @putnopvut and @dceara.

Sorry, we didn't think about farm nodes that way. A farm node models a physical server, and each physical server runs N sandboxes or containers that simulate hypervisors. So in what use case would one need multiple farm nodes on the same physical server? On the other hand, assuming there is such a use case, how does ovn-scale-test tell which farm nodes are on the same physical server so that they can share the same connection? Can SSH use the same mechanism to share connections between multiple farm nodes on the same physical server? Why would a plaintext connection be different from SSH on that front?
In our case, we are using ovn-fake-multinode to set up containers representing the farm nodes. In our rally deployment, we specify
The ovn_multihost context gathers data from the deployment (see get_ovn_multihost_info), including SSH credentials. By referencing this context data, we can tell whether multiple farm nodes have the same SSH host. I think SSH could use this same mechanism if desired. My mindset when creating this patch was to focus on the plaintext client, not to add features to the SSH client. However, I can see how it could be frustrating to have two clients that operate differently for seemingly no reason. For my next iteration on this pull request, I'll make the plaintext client work the way SSH does now. I can make a later change to alter both SSH and plaintext to share connections between farm nodes if possible.
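Detecting which farm nodes share a host from the context data could look like the sketch below. The input shape (a mapping of farm-node name to credential dict) is an assumption modeled on this discussion, not the actual return value of get_ovn_multihost_info(), and the helper name is invented:

```python
from collections import defaultdict

def group_farm_nodes_by_host(multihost_info):
    """Group farm-node names by their SSH host so that nodes backed by
    the same machine can share one connection. Illustrative only."""
    by_host = defaultdict(list)
    for name, cred in multihost_info.items():
        by_host[cred["host"]].append(name)
    return dict(by_host)
```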
Makes sense! Thanks for the explanation.
Sorry for the slow turnaround on this; essentially I've been down a rabbit hole. When it was suggested to add configuration for plaintext vs. SSH, it got me looking into the best way to configure it. It looks like the deployment is the proper place to do it. And from there, I found out that I had missed some key uses of SSH in the code, and that SSH use is baked in more heavily than I had previously thought.
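If the deployment is where the client choice lives, one hypothetical shape is a per-deployment key selecting the client type, with SSH as the default as requested. The `client_type` key below is invented for illustration and is not an existing ovn-scale-test option; the host value is a documentation address:

```json
{
  "controller": {
    "host": "192.0.2.10",
    "client_type": "ssh"
  }
}
```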